Health Data Science with R

Introduction

Welcome! In this interactive tutorial you will see how data science skills can be used to analyse maternal and child health. Specifically, you will explore birthweight and the factors that can lead to low birthweight, using the R statistical software.

Why Study Birthweight?

Birthweight is a crucial indicator of a newborn’s health and well-being. It serves as a fundamental metric in assessing a baby’s initial growth and development. Moreover, birthweight plays a pivotal role in predicting the infant’s short-term and long-term health outcomes. Babies born with low birthweight, typically defined as weighing less than 2,500 grams (5.5 pounds) at birth, face increased risks of complications, developmental issues, and chronic health conditions.
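The 2,500 gram threshold can be written as a one-line check in R. This is an illustrative sketch only: the helper name `is_low_birthweight` and the example weights are made up and are not part of the tutorial's own code.

```r
# Hypothetical helper: flag birthweights below the 2,500 gram threshold
is_low_birthweight <- function(grams, threshold = 2500) {
  grams < threshold
}

# Birthweights in grams for three made-up babies
weights <- c(2200, 2500, 3400)
is_low_birthweight(weights)
# TRUE for 2,200 g only; 2,500 g itself is not below the threshold
```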

created by Stable Diffusion

Factors Influencing Low Birthweight

There are many factors that may contribute to low birthweight:

  • Maternal Nutrition: Adequate maternal nutrition is paramount for the proper growth and development of the fetus. Poor maternal nutrition, whether due to malnutrition or inadequate dietary intake, can result in low birthweight.

  • Maternal Health Conditions: Certain maternal health conditions, such as hypertension, diabetes, and infections, can impact fetal growth and contribute to low birthweight. Managing and treating these conditions during pregnancy is crucial for the well-being of both the mother and the baby.

  • Lifestyle Factors: Maternal lifestyle choices, including smoking, alcohol consumption, and illicit drug use, have been linked to low birthweight. These substances can negatively affect fetal development and increase the risk of complications.

  • Socioeconomic Factors: Socioeconomic status is a significant determinant of maternal and child health. Limited access to healthcare, education, and resources can contribute to low birthweight. Understanding these social determinants allows for targeted interventions to address disparities.

  • Multiple Pregnancies: Twins, triplets, or other multiple pregnancies are at a higher risk of low birthweight due to the shared resources in the womb.

Birthweight in context

In New South Wales, data on birthweight are routinely recorded in the Perinatal Data Collection, a population-based surveillance system covering all births in NSW public and private hospitals, as well as home births. It encompasses all live births, and stillbirths of at least 20 weeks gestation or at least 400 grams birthweight.

Birthweight statistics are reported regularly, for example in the annual Mothers and Babies reports produced by NSW Health. Birthweight is also the subject of numerous academic studies, such as the recent journal article "Smoking Cessation during the Second Half of Pregnancy Prevents Low Birth Weight among Australian Born Babies in Regional New South Wales" [1].

Learn about how data are generated and used in the Australian health system in the course HDAT9100 Context for Health Data Science.

Test your understanding

True or False? A boy born weighing 2.2 kg would be classified as low birthweight.

Which of the following is not a risk factor for low birthweight?

Exploratory data analysis

library(MASS) # Includes the birthweight dataset
library(dplyr) # Tools for manipulating data

birthwt |> 
  select(age, smoke, ht, bwt) |> 
  head()
   age smoke ht  bwt
85  19     0  0 2523
86  33     0  0 2551
87  20     1  0 2557
88  21     1  0 2594
89  18     1  0 2600
91  21     0  0 2622

  • age: Mother’s age in years
  • smoke: Smoking status during pregnancy (0 = No, 1 = Yes)
  • ht: History of hypertension (0 = No, 1 = Yes)
  • bwt: Birthweight in grams

birthwt |> 
  select(age, smoke, ht, bwt) |> 
  summary()
      age            smoke              ht               bwt      
 Min.   :14.00   Min.   :0.0000   Min.   :0.00000   Min.   : 709  
 1st Qu.:19.00   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:2414  
 Median :23.00   Median :0.0000   Median :0.00000   Median :2977  
 Mean   :23.24   Mean   :0.3915   Mean   :0.06349   Mean   :2945  
 3rd Qu.:26.00   3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:3487  
 Max.   :45.00   Max.   :1.0000   Max.   :1.00000   Max.   :4990  

For example, we can see that the median maternal age is 23 years and that 39% of mothers in this dataset smoked during pregnancy.
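These two figures can also be computed directly rather than read off the summary. A minimal sketch, assuming the MASS package is installed; the variable names `median_age` and `prop_smokers` are introduced here for illustration. Because smoke is coded 0/1, its mean is the proportion of mothers who smoked.

```r
library(MASS)  # provides the birthwt data frame

median_age   <- median(birthwt$age)  # median maternal age in years
prop_smokers <- mean(birthwt$smoke)  # mean of a 0/1 variable = proportion coded 1

median_age
round(100 * prop_smokers)  # percentage of mothers who smoked
```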

Test your understanding

True or False? The oldest maternal age recorded was 40 years?

The median birthweight was ___ grams.

birthwt |> 
  group_by(smoke) |> 
  summarise(mean = mean(bwt))
# A tibble: 2 × 2
  smoke  mean
  <int> <dbl>
1     0 3056.
2     1 2772.
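The gap between the two group means can be computed from the same summary. A minimal sketch, assuming MASS and dplyr are installed; the names `group_means` and `diff_smoke` are introduced here for illustration.

```r
library(MASS)   # provides the birthwt data frame
library(dplyr)  # group_by() and summarise()

group_means <- birthwt |>
  group_by(smoke) |>
  summarise(mean = mean(bwt))

# Mean for non-smokers (smoke == 0) minus mean for smokers (smoke == 1)
diff_smoke <- group_means$mean[group_means$smoke == 0] -
  group_means$mean[group_means$smoke == 1]

round(diff_smoke)  # difference to the nearest gram
```

The same pattern should work for any other 0/1 variable in the dataset by changing the grouping column.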

True or False? Children born to mothers with a history of hypertension have lower birthweights on average?

The difference in birthweight between children born to mothers with and without hypertension (to the nearest gram) is ___ grams.



Data Visualisation

library(ggplot2) # Tools for visualising data

birthwt |> 
  mutate(
    smokeCategorical = factor(smoke, 
                               labels = c('Non-smoker', 'Smoker')
                               )) |> 
ggplot(
  aes(x = smokeCategorical, y = bwt)) +
    geom_boxplot() +
      scale_x_discrete("") +
      scale_y_continuous("Birthweight (grams)", labels = scales::comma) +
      labs(title="Birthweight by maternal smoking status") +
      theme_minimal()

birthwt |> 
  filter(age <= 40) |> 
  mutate(
    smokeCategorical = factor(smoke, 
                               labels = c('Non-smoker', 'Smoker')
                               )) |> 
ggplot(
  aes(x = age, y = bwt, color = smokeCategorical, fill = smokeCategorical, shape = smokeCategorical)) +
    geom_point() +
    geom_smooth(method = 'lm') +
      scale_x_continuous("Maternal age (years)") +
      scale_y_continuous("Birthweight (grams)", labels = scales::comma) +
      scale_shape_manual("Smoking status", values = c(21, 22)) +
      scale_color_manual("Smoking status", values = c('#03d77f', '#fb706a')) +
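      # lighten() is not defined by the packages loaded above; assumed here to be colorspace::lighten()
      scale_fill_manual("Smoking status", values = colorspace::lighten(c('#03d77f', '#fb706a'), 0.4)) +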
      labs(title="Birthweight by maternal age and maternal smoking status") +
      theme_minimal() +
      theme(legend.position = 'top')
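The `filter(age <= 40)` step in the plot above restricts the figure to mothers aged 40 or younger, so an unusually old mother does not stretch the x-axis or pull on the fitted lines. A quick way to check how many rows such a filter removes, sketched here with illustrative variable names:

```r
library(MASS)   # provides the birthwt data frame
library(dplyr)  # filter()

n_all  <- nrow(birthwt)
n_kept <- birthwt |>
  filter(age <= 40) |>
  nrow()

n_all - n_kept  # number of observations excluded from the plot
```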

Test your understanding

Which R package provides tools for data visualisation?

Which statement is most accurate based on the figure above?



Statistical Modelling

model1 <- lm(bwt ~ smoke, data = birthwt)

library(sjPlot)
tab_model(model1, digits = 0, title = 'Birthweight')
Birthweight (dependent variable: bwt)

Predictors     Estimates   95% CI          p
(Intercept)    3056        2924 to 3188    <0.001
smoke          -284        -495 to -73     0.009

Observations: 189
R² / R² adjusted: 0.036 / 0.031

We can interpret this as follows:

  • The average birthweight among babies born to non-smokers was 3,056 grams.
  • The 95% confidence interval (CI) for this estimate ranges from 2,924 grams to 3,188 grams. This is the range of values within which we are 95% confident that the true population coefficient lies. In other words, if you were to conduct the same study many times and calculate a 95% confidence interval for the non-smoker coefficient each time, you would expect about 95% of those intervals to contain the true coefficient.
  • On average, babies born to smokers weighed 284 grams less than babies born to non-smokers.
  • The 95% confidence interval (CI) for this difference ranges from -495 grams to -73 grams. It is interpreted in the same way: across many repeated studies, about 95% of the intervals calculated for the smoking coefficient would contain the true coefficient. Because this interval does not include 0, the data are consistent with a genuinely lower average birthweight among babies born to smokers (p = 0.009).
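The estimates and intervals discussed above can be reproduced with base R. A minimal sketch: `coef()` and `confint()` are standard functions in the stats package, and the model is refitted here so the block runs on its own. The names `ci` and `diff_means` are introduced for illustration.

```r
library(MASS)  # provides the birthwt data frame

model1 <- lm(bwt ~ smoke, data = birthwt)

coef(model1)                         # intercept and smoke coefficient
ci <- confint(model1, level = 0.95)  # 95% confidence intervals
ci

# With a single 0/1 predictor, the smoke coefficient equals the
# difference in mean birthweight between smokers and non-smokers
diff_means <- mean(birthwt$bwt[birthwt$smoke == 1]) -
  mean(birthwt$bwt[birthwt$smoke == 0])
diff_means
```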

Test your understanding

Birthweight (dependent variable: bwt)

Predictors     Estimates   95% CI          p
(Intercept)    2791        2316 to 3267    <0.001
smoke          -278        -489 to -67     0.010
age            11          -8 to 31        0.255

Observations: 189
R² / R² adjusted: 0.043 / 0.033
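A table like the one above would come from adding maternal age to the smoking model. The tutorial does not show the code for it, so the formula and the name `model2` below are assumptions made for illustration.

```r
library(MASS)  # provides the birthwt data frame

# Assumed model: birthweight regressed on smoking status and maternal age
model2 <- lm(bwt ~ smoke + age, data = birthwt)

round(coef(model2))     # point estimates, rounded to whole grams
round(confint(model2))  # 95% confidence intervals, rounded to whole grams
```

Each coefficient is now read as the expected difference in birthweight for a one-unit change in that predictor, holding the other predictor fixed.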

The estimated coefficient for maternal age is ___.

True or False? The 95% confidence interval for the coefficient of age includes 0

Footnotes

  1. Ghimire, P.R.; Mooney, J.; Fox, L.; Dubois, L. Smoking Cessation during the Second Half of Pregnancy Prevents Low Birth Weight among Australian Born Babies in Regional New South Wales. Int. J. Environ. Res. Public Health 2021, 18, 3417. https://doi.org/10.3390/ijerph18073417